{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<center>\n",
    "    <p style=\"text-align:center\">\n",
    "        <img alt=\"phoenix logo\" src=\"https://storage.googleapis.com/arize-phoenix-assets/assets/phoenix-logo-light.svg\" width=\"200\"/>\n",
    "        <br>\n",
    "        <a href=\"https://arize.com/docs/phoenix/\">Docs</a>\n",
    "        |\n",
    "        <a href=\"https://github.com/Arize-ai/phoenix\">GitHub</a>\n",
    "        |\n",
    "        <a href=\"https://arize-ai.slack.com/join/shared_invite/zt-11t1vbu4x-xkBIHmOREQnYnYDH1GDfCg?__hstc=259489365.a667dfafcfa0169c8aee4178d115dc81.1733501603539.1733501603539.1733501603539.1&__hssc=259489365.1.1733501603539&__hsfp=3822854628&submissionGuid=381a0676-8f38-437b-96f2-fc10875658df#/shared-invite/email\">Community</a>\n",
    "    </p>\n",
    "</center>\n",
    "<h1 align=\"center\">Tracing and Evaluating an Amazon Bedrock Agent</h1>\n",
    "\n",
    "In this tutorial, you will:\n",
    "\n",
    "- Build an Amazon Bedrock agent\n",
    "- Instrument and trace the agent with Phoenix\n",
    "- Add evaluations to your agent traces\n",
    "\n",
    "## Background\n",
    "\n",
    "[Amazon Bedrock Agents](https://aws.amazon.com/bedrock/agents/) is a fully managed capability in Amazon Bedrock that allows you to build AI agents that can complete tasks by interacting with enterprise systems, data sources, and APIs. These agents can understand user requests in natural language, break down complex tasks into steps, retrieve relevant information, and take actions to fulfill user requests. With Bedrock Agents, you can create conversational assistants that can answer questions, provide recommendations, and perform actions on behalf of users, all while maintaining context throughout the conversation.\n",
    "\n",
    "In this tutorial, we'll use Phoenix to trace and evaluate an Amazon Bedrock Agent, providing visibility into how the agent processes requests, makes decisions, and interacts with various systems to complete tasks.\n",
    "\n",
    "Let's get started!\n",
    "\n",
    "ℹ️ This notebook requires an AWS account with access to Bedrock."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Install Dependencies"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install -q uv\n",
    "!uv pip install -q arize-phoenix-otel boto3 anthropic openinference-instrumentation-bedrock"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import time\n",
    "from getpass import getpass\n",
    "\n",
    "import boto3\n",
    "import nest_asyncio\n",
    "\n",
    "from phoenix.otel import register\n",
    "\n",
    "nest_asyncio.apply()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Set Phoenix Environment Variables\n",
    "\n",
    "This example used [Phoenix Cloud](https://app.phoenix.arize.com), our free online hosted version of Phoenix. If you'd prefer, you can [self-host Phoenix](https://arize.com/docs/phoenix/self-hosting) instead."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "os.environ[\"PHOENIX_COLLECTOR_ENDPOINT\"] = \"https://app.phoenix.arize.com\"\n",
    "if not os.environ.get(\"PHOENIX_CLIENT_HEADERS\"):\n",
    "    os.environ[\"PHOENIX_CLIENT_HEADERS\"] = \"api_key=\" + getpass(\"Enter your Phoenix API key: \")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Connect to Phoenix\n",
    "Now you can connect your notebook to a Phoenix instance.\n",
    "\n",
    "The `auto_instrument` flag below will search your environment for any openinference-instrumentation packages, and call any that are found. Because you installed the openinference-instrumentation-bedrock library, any calls you make to Bedrock or Bedrock agents will be automatically instrumented and sent to Phoenix."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "project_name = \"Amazon Bedrock Agent Example\"\n",
    "\n",
    "tracer_provider = register(project_name=project_name, auto_instrument=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Configure your agent in Bedrock Agents\n",
    "\n",
    "Within [Bedrock Agents](https://us-east-2.console.aws.amazon.com/bedrock/home?region=us-east-2#/overview), create a new agent and configure it however you'd like. This example uses:\n",
    "1. A knowledgebase created using the webscraper tool.\n",
    "2. A set of action group functions that retrieve information about Phoenix.\n",
    "\n",
    "Bedrock Agents additionally supports Guardrails, Prompts, and more - all of which will be traced by Phoenix."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Connect your notebook to AWS\n",
    "\n",
    "You'll next need to create an AWS SSO profile that can connect to Bedrock agents. You can do this via the CLI using `aws configure sso`. Once you've run through that setup and created your agent in Bedrock Agents, fill in the variables below:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# SSO Profile Configuration\n",
    "PROFILE_NAME = \"phoenix\"  # The name of the AWS SSO profile you created\n",
    "REGION = \"us-east-2\"  # The region where your Bedrock agent is deployed\n",
    "SERVICE_NAME = \"bedrock-agent-runtime\"  # The service name of your Bedrock agent\n",
    "\n",
    "# Bedrock Agent Configuration\n",
    "AGENT_ID = \"\"  # The ID of your Bedrock agent, found in the Bedrock Agents console\n",
    "AGENT_ALIAS_ID = \"\"  # The alias ID of your Bedrock agent, found in the Bedrock Agents console"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "session = boto3.Session(profile_name=PROFILE_NAME)\n",
    "bedrock_agent_runtime = session.client(SERVICE_NAME, region_name=REGION)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Run your Agent\n",
    "You're now ready to run your Bedrock Agent."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def run(input_text):\n",
    "    session_id = f\"default-session1_{int(time.time())}\"\n",
    "\n",
    "    attributes = dict(\n",
    "        inputText=input_text,\n",
    "        agentId=AGENT_ID,\n",
    "        agentAliasId=AGENT_ALIAS_ID,\n",
    "        sessionId=session_id,\n",
    "        enableTrace=True,\n",
    "    )\n",
    "    response = bedrock_agent_runtime.invoke_agent(**attributes)\n",
    "\n",
    "    # Stream the response\n",
    "    for _, event in enumerate(response[\"completion\"]):\n",
    "        if \"chunk\" in event:\n",
    "            print(event)\n",
    "            chunk_data = event[\"chunk\"]\n",
    "            if \"bytes\" in chunk_data:\n",
    "                output_text = chunk_data[\"bytes\"].decode(\"utf8\")\n",
    "                print(output_text)\n",
    "        elif \"trace\" in event:\n",
    "            print(event[\"trace\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "run(\"Tell me about my recent Phoenix traces\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "run(\"How do I run evaluations in Arize Phoenix?\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "run(\"Tell me about my recent Phoenix experiments\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## View your Traces in Phoenix\n",
    "\n",
    "You should now be able to see traces in your Phoenix dashboard:\n",
    "\n",
    "![phoenix-bedrock-agent-traces-1](https://storage.googleapis.com/arize-phoenix-assets/assets/images/bedrock-agent-traces-1.png)\n",
    "![phoenix-bedrock-agent-traces-2](https://storage.googleapis.com/arize-phoenix-assets/assets/images/bedrock-agent-traces-2.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Evaluating your Agent\n",
    "\n",
    "Phoenix also includes built in LLM evaluations and code-based experiment testing. In this next section, you'll add Agent tool calling evaluations to your traces.\n",
    "\n",
    "Up until now, you'd just used the lighter-weight Phoenix OTEL tracing library. To run evals, you'll need to install the full library."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install -q arize-phoenix"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "\n",
    "import phoenix as px\n",
    "from phoenix.evals import (\n",
    "    TOOL_CALLING_PROMPT_RAILS_MAP,\n",
    "    TOOL_CALLING_PROMPT_TEMPLATE,\n",
    "    BedrockModel,\n",
    "    llm_classify,\n",
    ")\n",
    "from phoenix.trace.dsl import SpanQuery"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "query = (\n",
    "    SpanQuery()\n",
    "    .where(\n",
    "        # Filter for the `LLM` span kind.\n",
    "        # The filter condition is a string of valid Python boolean expression.\n",
    "        \"span_kind == 'LLM'\",\n",
    "    )\n",
    "    .select(\n",
    "        question=\"input.value\",\n",
    "        outputs=\"output.value\",\n",
    "    )\n",
    ")\n",
    "trace_df = px.Client().query_spans(query, project_name=project_name)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Apply JSON parsing to each row of trace_df.input.value\n",
    "trace_df[\"question\"] = trace_df[\"question\"].apply(\n",
    "    lambda x: json.loads(x).get(\"messages\", [{}])[0].get(\"content\", \"\") if isinstance(x, str) else x\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Function to extract tool call names from the output\n",
    "def extract_tool_calls(output_value):\n",
    "    tool_calls = []\n",
    "    try:\n",
    "        o = json.loads(output_value)\n",
    "\n",
    "        # Check if the output has 'content' which is a list of message components\n",
    "        if \"content\" in o and isinstance(o[\"content\"], list):\n",
    "            for item in o[\"content\"]:\n",
    "                # Check if this item is a tool_use type\n",
    "                if isinstance(item, dict) and item.get(\"type\") == \"tool_use\":\n",
    "                    # Extract the name of the tool being called\n",
    "                    tool_name = item.get(\"name\")\n",
    "                    if tool_name:\n",
    "                        tool_calls.append(tool_name)\n",
    "    except (json.JSONDecodeError, TypeError, AttributeError):\n",
    "        pass\n",
    "\n",
    "    return tool_calls\n",
    "\n",
    "\n",
    "# Apply the function to each row of trace_df.output.value\n",
    "trace_df[\"tool_call\"] = trace_df[\"outputs\"].apply(\n",
    "    lambda x: extract_tool_calls(x) if isinstance(x, str) else []\n",
    ")\n",
    "\n",
    "# Display the tool calls found\n",
    "print(\"Tool calls found in traces:\", trace_df[\"tool_call\"].sum())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Keep only rows where tool_calls is not empty (has at least one tool call)\n",
    "trace_df = trace_df[trace_df[\"tool_call\"].apply(lambda x: len(x) > 0)]\n",
    "\n",
    "trace_df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "trace_df[\"tool_definitions\"] = (\n",
    "    \"phoenix-traces retrieves the latest trace information from Phoenix, phoenix-experiments retrieves the latest experiment information from Phoenix, phoenix-datasets retrieves the latest dataset information from Phoenix\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "rails = list(TOOL_CALLING_PROMPT_RAILS_MAP.values())\n",
    "\n",
    "eval_model = BedrockModel(session=session, model_id=\"anthropic.claude-3-5-haiku-20241022-v1:0\")\n",
    "\n",
    "response_classifications = llm_classify(\n",
    "    data=trace_df,\n",
    "    template=TOOL_CALLING_PROMPT_TEMPLATE,\n",
    "    model=eval_model,\n",
    "    rails=rails,\n",
    "    provide_explanation=True,\n",
    ")\n",
    "response_classifications[\"score\"] = response_classifications.apply(\n",
    "    lambda x: 1 if x[\"label\"] == \"correct\" else 0, axis=1\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from phoenix.client import AsyncClient\n",
    "\n",
    "px_client = AsyncClient()\n",
    "await px_client.spans.log_span_annotations_dataframe(\n",
    "    dataframe=response_classifications,\n",
    "    annotation_name=\"Tool Calling Eval\",\n",
    "    annotator_kind=\"LLM\",\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You should now see your evaluation labels in Phoenix!\n",
    "\n",
    "![bedrock-agent-evals-1](https://storage.googleapis.com/arize-phoenix-assets/assets/images/bedrock-agent-evals-1.png)\n",
    "![bedrock-agent-evals-2](https://storage.googleapis.com/arize-phoenix-assets/assets/images/bedrock-agent-evals-2.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Next Steps\n",
    "\n",
    "From here, you could look to expand your agent's capabilities in Bedrock by attaching it to your own tools and lambda functions. Or you could expand your testing and experiment flows in Phoenix by checking out [Experiments](https://arize.com/docs/phoenix/datasets-and-experiments/how-to-experiments/run-experiments) or [Prompts](https://arize.com/docs/phoenix/prompt-engineering/overview-prompts).\n",
    "\n",
    "And for more on [Agents](https://arize.com/ai-agents/) and [Evaluation](https://arize.com/llm-evaluation), check out Arize's [website](https://arize.com).\n",
    "\n",
    "We can't wait to see what you'll build!"
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
